Creating an MTT Treebank of Spanish
Abstract
We present a cost-effective strategy for creating a mid-size, fine-grained dependency treebank of Spanish annotated with surface- and deep-syntactic structures as defined in the Meaning-Text Theory. The strategy starts from a small seed dependency corpus, the AnCora corpus, whose annotation is considerably more coarse-grained than our target annotation. We show that this discrepancy can largely be bridged by automatic means that rely on contextual information, thus leaving minimal work to the annotators. This allows us to develop the resources with limited human effort and within a limited period of time. We also propose a preliminary evaluation of the actual amount of work that the annotation process requires.
Similar resources
Interactive Predictive Parsing Framework for the Spanish Language
The Interactive Predictive Parsing (IPP) framework allows the construction of interactive tree annotation systems. These can help human annotators create error-free parse trees with little effort (compared to manually post-editing the trees obtained from a completely automatic parser). In this paper we adapt the IPP framework and the IPP-Ann annotation tool for parse of the Spanish lang...
A tree is a Baum is an árbol is a sach'a: Creating a trilingual treebank
This paper describes the process of constructing a trilingual parallel treebank. While for two of the involved languages, Spanish and German, there are already corpora with well-established annotation schemes available, this is not the case with the third language: Cuzco Quechua (ISO 639-3:quz), a low-resourced, non-standardized language for which we had to define a linguistically plausible ann...
A Treebank of Spanish and its Application to Parsing
This paper presents joint research between a Spanish team and an American one on the development and exploitation of a Spanish treebank. Such treebanks for other languages have proven valuable for the development of high-quality parsers and for a wide variety of language studies. However, when the project started, at the end of 1997, there was no syntactically annotated corpus for Spanish. This...
Extracting LTAG Grammars from a Spanish Treebank
Treebank grammars have been known to help in building robust, wide-coverage statistical parsers that also obtain state-of-the-art accuracies. In this work, we present a system that extracts LTAG grammars for Spanish from a constituency-based Spanish treebank. We evaluate the extracted grammar in terms of its size, its coverage on unseen data and the performance of a supertagger trained on it. The s...
Revising the METU-Sabancı Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages
In this paper, we present a revision of the training set of the METU-Sabancı Turkish syntactic dependency treebank composed of 4997 sentences in accordance with the principles of the Meaning-Text Theory (MTT). MTT reflects the multilayered nature of language by a linguistic model in which each linguistic phenomenon is treated at its corresponding level(s). Our analysis of the METU-Sabancı synta...